AITopics | weighted average

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models

Neural Information Processing Systems | Jun-16-2026, 17:45:26 GMT

Transformer models have become the dominant backbone for sequence modeling, leveraging self-attention to produce contextualized token representations. These are typically aggregated into fixed-size vectors via pooling operations for downstream tasks. While much of the literature has focused on attention mechanisms, the role of pooling remains underexplored despite its critical impact on model behavior. In this paper, we introduce a theoretical framework that rigorously characterizes the expressivity of Transformer-based models equipped with widely used pooling methods by deriving closed-form bounds on their representational capacity and the ability to distinguish similar inputs. Our analysis extends to different variations of attention formulations, demonstrating that these bounds hold across diverse architectural variants. We empirically evaluate pooling strategies across tasks requiring both global and local contextual understanding, spanning three major modalities: computer vision, natural language processing, and time-series analysis. Results reveal consistent trends in how pooling choices affect accuracy, sensitivity, and optimization behavior. Our findings unify theoretical and empirical perspectives, providing practical guidance for selecting or designing pooling mechanisms suited to specific tasks. This work positions pooling as a key architectural component in Transformer models and lays the foundation for more principled model design beyond attention alone.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls

Neural Information Processing Systems | Apr-25-2026, 17:45:57 GMT

We investigate the optimal design of experimental studies that have pre-treatment outcome data available. The average treatment effect is estimated as the difference between the weighted average outcomes of the treated and control units. A number of commonly used approaches fit this formulation, including the difference-in-means estimator and a variety of synthetic-control techniques. We propose several methods for choosing the set of treated units in conjunction with the weights. Observing the NP-hardness of the problem, we introduce a mixed-integer programming formulation which selects both the treatment and control sets and unit weightings. We prove that these proposed approaches lead to qualitatively different experimental units being selected for treatment. We use simulations based on publicly available data from the US Bureau of Labor Statistics that show improvements in terms of mean squared error and statistical power when compared to simple and commonly used alternatives such as randomized trials.
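The estimator form described in this abstract (a difference of weighted average outcomes) can be illustrated with a minimal sketch. This is not the paper's mixed-integer method for choosing units and weights; the function name `weighted_average_effect`, its parameters, and the toy data are all assumptions made for illustration. With uniform weights it reduces to the difference-in-means estimator.

```python
import numpy as np

def weighted_average_effect(y, treated, w_treated=None, w_control=None):
    """Difference between weighted average outcomes of treated and control units.

    Hypothetical helper: weights default to uniform, which gives the
    difference-in-means estimator. Non-uniform control weights mimic a
    synthetic-control style estimate.
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    y_t, y_c = y[treated], y[~treated]
    w_t = np.ones(len(y_t)) if w_treated is None else np.asarray(w_treated, dtype=float)
    w_c = np.ones(len(y_c)) if w_control is None else np.asarray(w_control, dtype=float)
    # np.average normalizes by the sum of the weights.
    return np.average(y_t, weights=w_t) - np.average(y_c, weights=w_c)

# Toy outcomes (made up): two treated units, two control units.
y = [3.0, 5.0, 1.0, 2.0]
treated = [True, True, False, False]
print(weighted_average_effect(y, treated))                          # 2.5
print(weighted_average_effect(y, treated, w_control=[0.75, 0.25]))  # 2.75
```

The second call shows how reweighting the control units changes the estimate without changing the data, which is the degree of freedom the abstract says is chosen jointly with the treated set.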

artificial intelligence, optimization problem, treatment effect, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.90)

Industry: Banking & Finance > Economy (0.66)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)

Adaptive Averaging in Accelerated Descent Dynamics

Walid Krichene, Alexandre Bayen, Peter L. Bartlett

Neural Information Processing Systems | Mar-23-2026, 02:33:53 GMT

We study accelerated descent dynamics for constrained convex optimization. These dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate η(t), and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights w(t). Using a Lyapunov argument, we give sufficient conditions on η and w to achieve a desired convergence rate. As an example, we show that the replicator dynamics (an example of mirror descent on the simplex) can be accelerated using a simple averaging scheme. We then propose an adaptive averaging heuristic which adaptively computes the weights to speed up the decrease of the Lyapunov function. We provide guarantees on adaptive averaging in continuous-time, prove that it preserves the quadratic convergence rate of accelerated first-order methods in discrete-time, and give numerical experiments to compare it with existing heuristics, such as adaptive restarting. The experiments indicate that adaptive averaging performs at least as well as adaptive restarting, with significant improvements in some cases.
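One way to write down the coupling described in this abstract is sketched below. The symbols η(t) and w(t) come from the abstract; the objective $f$, the dual variable $Z$, the primal variable $X$, and the mirror map $\nabla\psi^*$ are notation assumed here for illustration, and initial conditions are omitted, so this may differ in detail from the paper's own statement.

```latex
% Dual variable: accumulates gradients at rate \eta(t)
\dot{Z}(t) = -\eta(t)\,\nabla f\bigl(X(t)\bigr)

% Primal variable: weighted average of the mirrored dual trajectory
X(t) = \frac{\int_{0}^{t} w(\tau)\,\nabla\psi^{*}\bigl(Z(\tau)\bigr)\,d\tau}
            {\int_{0}^{t} w(\tau)\,d\tau}
```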

artificial intelligence, machine learning, trajectory, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Mechanism Design for Collaborative Normal Mean Estimation

Neural Information Processing Systems | Feb-16-2026, 02:31:39 GMT

However, simply pooling everyone's data and sharing with each other can lead to free-riding [

agent, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine > Therapeutic Area (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)

48d23e87eb98cc2227b5a8c33fa00680-Paper.pdf

Neural Information Processing Systems | Feb-8-2026, 12:07:30 GMT

artificial intelligence, machine learning, treatment effect, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Spain > Basque Country (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (0.47)
Banking & Finance > Economy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning (0.46)

4e668929edb3bf915e1a3a9d96c3c97e-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-8-2026, 09:27:32 GMT

interaction, latent variable, latent-variable structure, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.50)

Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model

Neural Information Processing Systems | Dec-24-2025, 17:30:25 GMT

This paper studies two-layer Neural Networks (NN), where the first layer contains random weights, and the second layer is trained using Ridge regularization. This model has been the focus of numerous recent works, showing that despite its simplicity, it captures some of the empirically observed behaviors of NN in the overparametrized regime, such as the double-descent curve where the generalization error decreases as the number of weights increases to $+\infty$. This paper establishes asymptotic distribution results for this two-layer NN model in the regime where the ratios $\frac p n$ and $\frac d n$ have finite limits, where $n$ is the sample size, $p$ the ambient dimension and $d$ is the width of the first layer. We show that a weighted average of the derivatives of the trained NN at the observed data is asymptotically normal, in a setting with Lipschitz activation functions in a linear regression response with Gaussian features under possibly non-linear perturbations. We then leverage this asymptotic normality result to construct confidence intervals (CIs) for single components of the unknown regression vector. The novelty of our results is threefold: (1) Despite the nonlinearity induced by the activation function, we characterize the asymptotic distribution of a weighted average of the gradients of the network after training; (2) It provides the first frequentist uncertainty quantification guarantees, in the form of valid $(1-\alpha)$-CIs, based on NN estimates; (3) It shows that the double-descent phenomenon occurs in terms of the length of the CIs, with the length increasing and then decreasing as $\frac d n\nearrow +\infty$ for certain fixed values of $\frac p n$. We also provide a toolbox to predict the length of CIs numerically, which lets us compare activation functions and other parameters in terms of CI length.

2-layer neural network, asymptotic normality and confidence interval, derivative, (8 more...)

Neural Information Processing Systems

Genre: Research Report (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Goal inference with Rao-Blackwellized Particle Filters

Wang, Yixuan, Guralnik, Dan P., Dixon, Warren E.

arXiv.org Artificial Intelligence | Dec-11-2025

Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a fundamental estimation problem. We initiate the study of such intent inference using a variant of a Rao-Blackwellized Particle Filter (RBPF), subject to the assumption that the agent's intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property. Leveraging the assumed closed-form agent dynamics, the RBPF analytically marginalizes the linear-Gaussian substructure and updates particle weights only, improving sample efficiency over a standard particle filter. Two different estimators are introduced: a Gaussian mixture model using the RBPF weights and a reduced version confining the mixture to the effective sample. We quantify how well the adversary can recover the agent's intent using information-theoretic leakage metrics and provide computable lower bounds on the Kullback-Leibler (KL) divergence between the true intent distribution and RBPF estimates via Gaussian-mixture KL bounds. We also provide a bound on the difference in performance between the two estimators, highlighting the fact that the reduced estimator performs almost as well as the complete one. Experiments illustrate fast and accurate intent recovery for compliant agents, motivating future work on designing intent-obfuscating controllers.

artificial intelligence, machine learning, particle, (17 more...)

arXiv.org Artificial Intelligence

2512.09269

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.81)

Exploring Vulnerability in AI Industry

Pirrone, Claudio, Fricano, Stefano, Fazio, Gioacchino

arXiv.org Artificial Intelligence | Oct-28-2025

The rapid ascent of Foundation Models (FMs), enabled by the Transformer architecture, drives the current AI ecosystem. Characterized by large-scale training and downstream adaptability, FMs (such as the GPT family) have achieved massive public adoption, fueling a turbulent market shaped by platform economics and intense investment. Assessing the vulnerability of this fast-evolving industry is critical yet challenging due to data limitations. This paper proposes a synthetic AI Vulnerability Index (AIVI) focusing on the upstream value chain for FM production, prioritizing publicly available data. We model FM output as a function of five inputs: Compute, Data, Talent, Capital, and Energy, hypothesizing that supply vulnerability in any input threatens the industry. Key vulnerabilities include compute concentration, data scarcity and legal risks, talent bottlenecks, capital intensity and strategic dependencies, as well as escalating energy demands. Acknowledging imperfect input substitutability, we propose a weighted geometric average of aggregate subindexes, normalized using theoretical or empirical benchmarks. Despite limitations and room for improvement, this preliminary index aims to quantify systemic risks in AI's core production engine, and implicitly shed light on the risks for the downstream value chain.
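As a rough illustration of the aggregation rule named in this abstract, the sketch below computes a weighted geometric average of positive subindex values. The function name `weighted_geometric_mean`, the subindex values, and the weights are hypothetical; the abstract does not give the paper's actual weights, normalization, or scale direction.

```python
import math

def weighted_geometric_mean(values, weights):
    """Weighted geometric average: exp(sum(w_i * ln v_i) / sum(w_i)).

    Hypothetical helper; requires strictly positive values and a
    positive total weight.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be strictly positive")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total)

# Made-up normalized subindexes for the five inputs named in the abstract.
subindexes = {"compute": 0.9, "data": 0.6, "talent": 0.7, "capital": 0.8, "energy": 0.2}
weights = {name: 1.0 for name in subindexes}  # equal weights, an assumption

geo = weighted_geometric_mean(list(subindexes.values()), list(weights.values()))
arith = sum(subindexes.values()) / len(subindexes)
print(f"geometric: {geo:.3f}, arithmetic: {arith:.3f}")
```

A geometric average never exceeds the arithmetic one for the same weights, and a single low subindex pulls it down more sharply, which fits the abstract's point about imperfect substitutability between inputs.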

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.23421

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry:

Law (1.00)
Banking & Finance (0.94)
Government > Regional Government (0.46)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

MergeMoE: Efficient Compression of MoE Models via Expert Output Merging

Miao, Ruijie, Yao, Yilun, Wang, Zihan, Wang, Zhiming, Yi, Bairen, Liu, LingJun, Zhao, Yikai, Yang, Tong

arXiv.org Artificial Intelligence | Oct-17-2025

The Mixture-of-Experts (MoE) technique has proven to be a promising solution to efficiently scale the model size, which has been widely applied in recent LLM advancements. However, the substantial memory overhead of MoE models has made their compression an important research direction. In this work, we provide a theoretical analysis of expert merging, a recently proposed technique for compressing MoE models. Rather than interpreting expert merging from the conventional perspective of parameter aggregation, we approach it from the perspective of merging experts' outputs. Our key insight is that the merging process can be interpreted as inserting additional matrices into the forward computation, which naturally leads to an optimization formulation. Building on this analysis, we introduce MergeMoE, a method that leverages mathematical optimization to construct the compression matrices. We evaluate MergeMoE on multiple MoE models and show that our algorithm consistently outperforms the baselines with the same compression ratios.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.14436

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
